
add otel exports #3373

Open
kinjalh wants to merge 3 commits into master from otel-exports

Conversation

Member

@kinjalh kinjalh commented Mar 30, 2026

Add OTel exports

Test Plan

Unit tests

Checklist

  • If required, I have updated the Plural documentation accordingly.
  • I have added tests to cover my changes.
  • I have added a meaningful title and summary to convey the impact of this PR to a user.

Plural Flow: console

@kinjalh kinjalh added the enhancement New feature or request label Mar 30, 2026
Member

@michaeljguarino michaeljguarino left a comment


This would need GraphQL exposure of the deployment_settings metrics fields, and integration into the k8s operator as well. You can probably get a Cursor agent to do it for you, tbh.

@@ -0,0 +1,95 @@

```elixir
defmodule Console.Otel.Exporter do
```
Member


Is there no library that handles this at the moment? There might not be, but it would be nice to use one if there is.

Member Author


I think the existing opentelemetry_exporter library is still in dev status (not guaranteed to be stable)?

"""
@spec maybe_export_metrics() :: :ok | {:error, term}
def maybe_export_metrics do
case Settings.fetch() do
Member


Convert this into a `with` expression (almost all nested `case`s can become one).
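As a rough illustration of that shape (a minimal sketch with hypothetical `fetch_settings/0` and `fetch_endpoint/1` stand-ins, not the real Settings API):

```elixir
defmodule MaybeExport do
  # Hypothetical stand-ins for the real settings lookups
  def fetch_settings,
    do: {:ok, %{metrics: %{enabled: true, endpoint: "http://otel:4318"}}}

  def fetch_endpoint(%{metrics: %{enabled: true, endpoint: ep}}), do: {:ok, ep}
  def fetch_endpoint(_), do: {:error, :disabled}

  # `with` short-circuits on the first clause that doesn't match, returning
  # that value (e.g. {:error, :disabled}) unchanged -- no nested `case` needed
  def maybe_export_metrics do
    with {:ok, settings} <- fetch_settings(),
         {:ok, endpoint} <- fetch_endpoint(settings) do
      {:exported, endpoint}
    end
  end
end
```

Each failing clause falls through as-is, so the caller still sees a tagged error tuple without any nesting.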

```elixir
  Use this for direct invocation (testing, manual runs).
  """
  @spec export_metrics() :: :ok | {:error, term}
  def export_metrics do
```
Member


is this still needed?

```elixir
    end
  end

  defp should_run?(crontab) do
```
Member


I don't think this will work without some statefulness (you need to keep the last run date around somewhere to compute the next run time; we do that everywhere in the db-bound crons).

A way to simulate that would be to convert this into a GenServer, keep the last run time in GenServer state, and only run the metrics export if the node is the determined leader (use the Console.ClusterRing module to determine that).
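A minimal sketch of that suggestion, with `leader?/0` and `export/0` as placeholders (the real module would consult Console.ClusterRing and the crontab rather than a fixed 60-second window):

```elixir
defmodule ExporterSketch do
  use GenServer

  @poll_ms 60_000

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok) do
    # poll on an interval; the schedule check decides whether a run is due
    :timer.send_interval(@poll_ms, :check)
    {:ok, %{last_run_at: nil}}
  end

  @impl true
  def handle_info(:check, %{last_run_at: last} = state) do
    if leader?() and due?(last) do
      export()
      {:noreply, %{state | last_run_at: DateTime.utc_now()}}
    else
      {:noreply, state}
    end
  end

  defp leader?, do: true  # placeholder for the Console.ClusterRing check
  defp due?(nil), do: true
  defp due?(last), do: DateTime.diff(DateTime.utc_now(), last) >= 60
  defp export, do: :ok    # placeholder for the actual metrics export
end
```

Keeping `last_run_at` in GenServer state gives the cron computation a stable reference point without a DB write.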

```elixir
    []
    |> Stream.concat(build_service_metrics(timestamp))
    |> Stream.concat(build_cluster_metrics(timestamp))
    |> Enum.to_list()
```
Member


Doing an Enum.to_list defeats the whole purpose of the stream (maintaining O(1) memory usage). What you should do here instead is pipe the concatenated stream into Stream.chunk_every (with 100 or some other chunk size), and then pipe that to the exporter.
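A self-contained illustration of that pipeline shape (`export_chunk` is a stand-in for the real exporter call):

```elixir
# Stand-in exporter: reports how many metrics a chunk carried
export_chunk = fn chunk -> {:ok, length(chunk)} end

exported =
  Stream.concat(
    Stream.map(1..150, &%{name: "service.health", value: &1}),
    Stream.map(1..100, &%{name: "cluster.health", value: &1})
  )
  |> Stream.chunk_every(100)
  |> Enum.reduce(0, fn chunk, acc ->
    {:ok, n} = export_chunk.(chunk)
    acc + n
  end)

# only one chunk of at most 100 metrics is materialized at a time
```

The reduce drives the stream lazily, so memory stays bounded by the chunk size rather than the total metric count.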

```elixir
    |> Stream.flat_map(&build_cluster_metrics_list(&1, timestamp))
  end

  defp build_service_metric(%Service{} = service, timestamp) do
```
Member


mostly a style nit, but i'd consider moving this metric formatting code to another module

```elixir
      |> DateTime.to_naive()
      |> NaiveDateTime.add(60, :second)

    next = Crontab.Scheduler.get_next_run_date!(expr, base_time)
```
Member


I'd generally make these a `with` expression with {:ok, _} tuple matches down the line so you don't needlessly crash the GenServer on bad input.
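Sketched with hypothetical `parse` / `next_run` helpers that mirror the Crontab library's {:ok, _} / {:error, _} non-bang API:

```elixir
defmodule CronCheck do
  # stand-ins for Crontab.CronExpression.Parser.parse/1 and
  # Crontab.Scheduler.get_next_run_date/2
  def parse("* * * * *"), do: {:ok, :every_minute}
  def parse(expr), do: {:error, {:invalid, expr}}

  def next_run(:every_minute, base), do: {:ok, NaiveDateTime.add(base, 60)}

  # with {:ok, _} matches, a bad crontab surfaces as {:error, _} instead of
  # raising (as the bang variant would) and crashing the GenServer
  def next_run_date(expr, base) do
    with {:ok, parsed} <- parse(expr),
         {:ok, next} <- next_run(parsed, base),
         do: {:ok, next}
  end
end
```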

Member

@michaeljguarino michaeljguarino left a comment


looks mostly fine, minus a few nits

```elixir
  defp do_export(endpoint) do
    timestamp = DateTime.utc_now()

    Repo.transaction(fn ->
```
Member


don't worry about the transaction here, it's read-only so no need to roll back

```elixir
        count
      end
    end)
    |> case do
```
Member


Can probably just make this `|> then(&Logger.info("Exported #{&1} metrics to #{endpoint}"))`. It's not hard to note that 0 metrics implies it didn't export.

Member

@michaeljguarino michaeljguarino left a comment


one thing that still isn't here is the kubernetes CRD definition for setting these fields. You need to:

  1. Expose the inputs in gql so the client can modify them
  2. Modify the crd type
  3. Modify the controller to wire it up

Oftentimes I can just get AI to do this for me now, but you probably need to familiarize yourself enough with the structure to prompt it well.

@kinjalh kinjalh force-pushed the otel-exports branch 3 times, most recently from 85dfc18 to 572fa02 on April 7, 2026 at 18:59
@kinjalh kinjalh marked this pull request as ready for review April 8, 2026 14:47
@greptile-apps
Contributor

greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR adds an OpenTelemetry metrics export pipeline: a new MetricsExporter GenServer that reads cluster/service health from the DB on a configurable crontab and POSTs OTLP/JSON gauge payloads to a collector endpoint. The feature is gated by DeploymentSettings.metrics, which is added as an embedded schema with crontab validation, a DB migration, GraphQL types, and a Go CRD type for the controller.

  • P1 — silent export failure: Repo.transaction/1's return value is discarded in do_export/1, so a DB-level failure (connection error, lock timeout) produces no error log and last_run_at is still updated as if the export succeeded, making the failure invisible.

Confidence Score: 4/5

Safe to merge after addressing the silent transaction failure in do_export/1.

One P1 defect: DB transaction errors in do_export are silently swallowed, last_run_at is updated regardless, and no error is logged — making export failures invisible. The remaining findings are P2 (unused struct field, dead Stream.reject, minor leader-check race). The core pipeline logic, schema validation, migration, GraphQL types, and Go CRD are all well-structured and well-tested.

lib/console/otel/metrics_exporter.ex — transaction error handling and check_ref cleanup

Vulnerabilities

No security concerns identified. The OTel endpoint is stored as a plain string (no auth credentials), which limits authentication options but introduces no new vulnerabilities. Sensitive fields in DeploymentSettings that already exist (passwords, tokens) continue to use EncryptedString; the new metrics embed contains no secret fields.

Important Files Changed

| Filename | Overview |
| --- | --- |
| lib/console/otel/metrics_exporter.ex | New GenServer that schedules and drives OTel metric exports; has a silent transaction failure bug and a minor leader-check race in the :export handler. |
| lib/console/otel/exporter.ex | HTTP OTLP/JSON exporter using Req; cleanly formats metrics and handles errors. |
| lib/console/otel/metrics_builder.ex | Pure functions building OTel metric maps from Cluster/Service records; has a harmless dead Stream.reject call. |
| lib/console/schema/deployment_settings.ex | Adds embedded Metrics schema with enabled/endpoint/crontab fields and proper validation (required-if-enabled + cron syntax check). |
| priv/repo/migrations/20250429000000_add_metrics_export_settings.exs | Adds a :map column for the metrics embedded schema to the deployment_settings table. |
| go/controller/api/v1alpha1/deploymentsettings_types.go | Adds MetricsSettings CRD type with kubebuilder markers and an Attributes() conversion method wired into the controller. |
| lib/console/graphql/deployments/settings.ex | Adds MetricsSettings input/output GraphQL types and wires them into the DeploymentSettings object and mutation input. |
| lib/console/application.ex | Registers Console.Otel.MetricsExporter as a supervised child process. |

Reviews (1): Last reviewed commit: "add otel exports"

Comment on lines +105 to +128
```elixir
defp do_export(endpoint) do
  timestamp = DateTime.utc_now()

  # Transaction is required for Repo.stream cursor support
  Repo.transaction(fn ->
    []
    |> Stream.concat(MetricsBuilder.service_metrics_stream(timestamp))
    |> Stream.concat(MetricsBuilder.cluster_metrics_stream(timestamp))
    |> Stream.chunk_every(@chunk_size)
    |> Enum.reduce(0, fn chunk, count ->
      case Exporter.export(endpoint, chunk) do
        :ok ->
          count + length(chunk)

        {:error, reason} ->
          Logger.error("Failed to export chunk: #{inspect(reason)}")
          count
      end
    end)
    |> then(&Logger.info("Exported #{&1} metrics to #{endpoint}"))
  end)

  :ok
end
```
Contributor


P1 Transaction errors are silently swallowed

Repo.transaction/1 returns {:ok, result} or {:error, reason}, but the return value is discarded and do_export/1 unconditionally returns :ok. If the database transaction fails (connection error, lock timeout, etc.), there is no error log and the caller in handle_info(:export) still updates last_run_at as if the export succeeded — masking the failure entirely.

```elixir
defp do_export(endpoint) do
  timestamp = DateTime.utc_now()

  result = Repo.transaction(fn ->
    []
    |> Stream.concat(MetricsBuilder.service_metrics_stream(timestamp))
    |> Stream.concat(MetricsBuilder.cluster_metrics_stream(timestamp))
    |> Stream.chunk_every(@chunk_size)
    |> Enum.reduce(0, fn chunk, count ->
      case Exporter.export(endpoint, chunk) do
        :ok -> count + length(chunk)
        {:error, reason} ->
          Logger.error("Failed to export chunk: #{inspect(reason)}")
          count
      end
    end)
    |> then(&Logger.info("Exported #{&1} metrics to #{endpoint}"))
  end)

  case result do
    {:ok, _} -> :ok
    {:error, reason} ->
      Logger.error("Metrics export transaction failed: #{inspect(reason)}")
      {:error, reason}
  end
end
```

And in handle_info(:export), only update last_run_at on :ok to allow a retry on the next schedule cycle.
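The state-update rule that comment describes can be sketched as a pure helper (`handle_export/2` is hypothetical; the export result is injected so both branches are visible):

```elixir
defmodule RetryOnFailure do
  # on success, record the run time and clear the fired timer
  def handle_export(state, :ok),
    do: %{state | last_run_at: DateTime.utc_now(), timer_ref: nil}

  # on failure, leave last_run_at untouched so the next schedule cycle retries
  def handle_export(state, {:error, _reason}),
    do: %{state | timer_ref: nil}
end
```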

Comment on lines +13 to +15
```elixir
defmodule State do
  defstruct [:last_run_at, :timer_ref, :check_ref]
end
```
Contributor


P2 check_ref field is never populated

State declares a check_ref field but :timer.send_interval/2 is called in init/1 without capturing its return value, so check_ref is always nil. The field is unused throughout the module and can be removed to avoid confusion.

Suggested change

```diff
 defmodule State do
-  defstruct [:last_run_at, :timer_ref, :check_ref]
+  defstruct [:last_run_at, :timer_ref]
 end
```

Comment on lines +42 to +51
```elixir
def handle_info(:export, state) do
  case get_config() do
    {:ok, %{endpoint: endpoint}} ->
      do_export(endpoint)
      {:noreply, %{state | last_run_at: DateTime.utc_now(), timer_ref: nil}}

    _ ->
      {:noreply, %{state | timer_ref: nil}}
  end
end
```
Contributor


P2 Leader check skipped in :export handler

handle_info(:export, state) only checks whether a config is present — it does not verify that this node is still the leader. If :export is already in the process mailbox when handle_info(:check) cancels the timer due to a leadership change (the timer has already fired so Process.cancel_timer returns 0 but the message is still queued), the export will run on the non-leader node, producing duplicate metric data points.

Consider adding a leader guard:

```elixir
def handle_info(:export, state) do
  case {leader?(), get_config()} do
    {true, {:ok, %{endpoint: endpoint}}} ->
      do_export(endpoint)
      {:noreply, %{state | last_run_at: DateTime.utc_now(), timer_ref: nil}}

    _ ->
      {:noreply, %{state | timer_ref: nil}}
  end
end
```

```elixir
|> Repo.stream(method: :keyset)
|> Console.throttle(count: 500, pause: 50)
|> Stream.map(&build_service_metric(&1, timestamp))
|> Stream.reject(&is_nil/1)
```
Contributor


P2 Stream.reject(&is_nil/1) is dead code

build_service_metric/2 always returns a %{name: ..., value: ..., ...} map and never returns nil, so this rejection step has no effect. It can safely be removed.

Suggested change

```diff
 |> Stream.map(&build_service_metric(&1, timestamp))
-|> Stream.reject(&is_nil/1)
```
